AITopics | fast computation

fast computation

FlashBias: Fast Computation of Attention with Bias

Neural Information Processing Systems, Jun-15-2026, 11:52:26 GMT

Attention with bias, which extends standard attention by introducing prior knowledge as an additive bias matrix to the query-key scores, has been widely deployed in vision, language, protein-folding and other advanced scientific models, underscoring its status as a key evolution of this foundational module. However, introducing bias terms creates a severe efficiency bottleneck in attention computation. It disrupts the tightly fused memory-compute pipeline that underlies the speed of accelerators like FlashAttention, thereby stripping away most of their performance gains and leaving biased attention computationally expensive. Surprisingly, despite its common usage, targeted efficiency optimization for attention with bias remains absent, which seriously hinders its application in complex tasks. Diving into the computation of FlashAttention, we prove that its optimal efficiency is determined by the rank of the attention weight matrix. Inspired by this theoretical result, this paper presents FlashBias based on the low-rank compressed sensing theory, which can provide fast-exact computation for many widely used attention biases and a fast-accurate approximation for biases in general formalizations. FlashBias can fully take advantage of the extremely optimized matrix multiplication operation in modern GPUs, achieving 1.5$\times$ speedup for Pairformer in AlphaFold 3, and over 2$\times$ speedup for attention with bias in vision and language models without loss of accuracy. Code is available at this repository: https://github.com/thuml/FlashBias.
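The low-rank idea in the abstract can be illustrated with a small NumPy sketch. It assumes the bias factors exactly as `u @ w.T`; in that case the bias can be folded into the query-key product by concatenating the factors onto the queries and keys, so no separate bias matrix has to be added to the scores. The function names, the relative-position bias `-slope * (i - j)` and the value of `slope` are illustrative assumptions, not the API of the linked repository.

```python
import numpy as np

def softmax(x):
    # Numerically stable row-wise softmax.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention_dense_bias(q, k, v, bias):
    # Reference: add the full (n x n) bias matrix to the scaled scores.
    scores = q @ k.T / np.sqrt(q.shape[-1]) + bias
    return softmax(scores) @ v

def attention_factored_bias(q, k, v, u, w):
    # Assumes bias == u @ w.T. Concatenating the factors gives
    # [q / sqrt(d), u] @ [k, w].T == q @ k.T / sqrt(d) + u @ w.T.
    scale = np.sqrt(q.shape[-1])
    q_ext = np.concatenate([q / scale, u], axis=-1)
    k_ext = np.concatenate([k, w], axis=-1)
    return softmax(q_ext @ k_ext.T) @ v

rng = np.random.default_rng(0)
n, d = 6, 4
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))

# Illustrative rank-2 bias: bias[i, j] = -slope * (i - j).
slope = 0.5
pos = np.arange(n, dtype=float)[:, None]
ones = np.ones((n, 1))
u = np.concatenate([-slope * pos, ones], axis=1)
w = np.concatenate([ones, slope * pos], axis=1)
bias = -slope * (pos - pos.T)

out_dense = attention_dense_bias(q, k, v, bias)
out_factored = attention_factored_bias(q, k, v, u, w)
```

NumPy still materializes the full score matrix here; the sketch only checks the algebra. The practical benefit would come from passing the extended queries and keys to a fused attention kernel that accepts no bias argument.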

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

Latent Equilibrium: A unified learning theory for arbitrarily fast computation with arbitrarily slow neurons

Neural Information Processing Systems, Dec-24-2025, 12:27:04 GMT

The response time of physical computational elements is finite, and neurons are no exception. In hierarchical models of cortical networks each layer thus introduces a response lag. This inherent property of physical dynamical systems results in delayed processing of stimuli and causes a timing mismatch between network output and instructive signals, thus afflicting not only inference, but also learning. We introduce Latent Equilibrium, a new framework for inference and learning in networks of slow components which avoids these issues by harnessing the ability of biological neurons to phase-advance their output with respect to their membrane potential. This principle enables quasi-instantaneous inference independent of network depth and avoids the need for phased plasticity or computationally expensive network relaxation phases. We jointly derive disentangled neuron and synapse dynamics from a prospective energy function that depends on a network's generalized position and momentum. The resulting model can be interpreted as a biologically plausible approximation of error backpropagation in deep cortical networks with continuous-time, leaky neuronal dynamics and continuously active, local plasticity. We demonstrate successful learning of standard benchmark datasets, achieving competitive performance using both fully-connected and convolutional architectures, and show how our principle can be applied to detailed models of cortical microcircuitry. Furthermore, we study the robustness of our model to spatio-temporal substrate imperfections to demonstrate its feasibility for physical realization, be it in vivo or in silico.
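A toy simulation can make the lag and its removal concrete. This is a minimal sketch under a stated assumption: the phase advance is modeled as the prospective signal `u + tau * du/dt` for a leaky integrator obeying `tau * du/dt = -u + drive`. The abstract does not spell out this form, and the time constant, step size and sinusoidal drive are hypothetical.

```python
import numpy as np

def simulate_leaky_neuron(tau=10.0, dt=0.1, steps=500):
    # Forward-Euler integration of tau * du/dt = -u + drive(t).
    t = np.arange(steps) * dt
    drive = np.sin(2.0 * np.pi * t / 20.0)
    u = np.zeros(steps)
    du = np.zeros(steps)
    for i in range(steps):
        du[i] = (drive[i] - u[i]) / tau
        if i + 1 < steps:
            u[i + 1] = u[i] + dt * du[i]
    # Assumed phase-advanced output: membrane potential plus tau times its rate of change.
    prospective = u + tau * du
    return t, drive, u, prospective

t, drive, u, prospective = simulate_leaky_neuron()
lag_error = np.max(np.abs(u - drive))            # membrane potential trails the input
advance_error = np.max(np.abs(prospective - drive))  # prospective signal tracks it
```

In this single-neuron setting the prospective signal reproduces the drive up to rounding, while the membrane potential is attenuated and delayed.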

fast computation, latent equilibrium, unified learning theory, (6 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.07)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.56)

FlashBias: Fast Computation of Attention with Bias

Wu, Haixu, Guo, Minghao, Ma, Yuezhou, Sun, Yuanxu, Wang, Jianmin, Matusik, Wojciech, Long, Mingsheng

arXiv.org Artificial Intelligence, Oct-27-2025

Attention with bias, which extends standard attention by introducing prior knowledge as an additive bias matrix to the query-key scores, has been widely deployed in vision, language, protein-folding and other advanced scientific models, underscoring its status as a key evolution of this foundational module. However, introducing bias terms creates a severe efficiency bottleneck in attention computation. It disrupts the tightly fused memory-compute pipeline that underlies the speed of accelerators like FlashAttention, thereby stripping away most of their performance gains and leaving biased attention computationally expensive. Surprisingly, despite its common usage, targeted efficiency optimization for attention with bias remains absent, which seriously hinders its application in complex tasks. Diving into the computation of FlashAttention, we prove that its optimal efficiency is determined by the rank of the attention weight matrix. Inspired by this theoretical result, this paper presents FlashBias based on the low-rank compressed sensing theory, which can provide fast-exact computation for many widely used attention biases and a fast-accurate approximation for biases in general formalizations. FlashBias can fully take advantage of the extremely optimized matrix multiplication operation in modern GPUs, achieving 1.5$\times$ speedup for Pairformer in AlphaFold 3, and over 2$\times$ speedup for attention with bias in vision and language models without loss of accuracy. Code is available at this repository: https://github.com/thuml/FlashBias.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.12044

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Latent Equilibrium: A unified learning theory for arbitrarily fast computation with arbitrarily slow neurons

Neural Information Processing Systems, Jan-17-2025, 15:44:13 GMT

latent equilibrium, neuron, unified learning theory, (3 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.08)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.40)

Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time

Liang, Yingyu, Sha, Zhizhou, Shi, Zhenmei, Song, Zhao, Zhou, Yufa

arXiv.org Artificial Intelligence, Aug-23-2024

The quadratic computational complexity in the self-attention mechanism of popular transformer architectures poses significant challenges for training and inference, particularly in terms of efficiency and memory requirements. Towards addressing these challenges, this paper introduces a novel fast computation method for gradient calculation in multi-layer transformer models. Our approach enables the computation of gradients for the entire multi-layer transformer model in almost linear time $n^{1+o(1)}$, where $n$ is the input sequence length. This breakthrough significantly reduces the computational bottleneck associated with the traditional quadratic time complexity. Our theory holds for any loss function and maintains a bounded approximation error across the entire model. Furthermore, our analysis can hold when the multi-layer transformer model contains many practical sub-modules, such as residual connection, causal mask, and multi-head attention. By improving the efficiency of gradient computation in large language models, we hope that our work will facilitate the more effective training and deployment of long-context language models based on our theoretical results.

denote, gradient, poly, (14 more...)

arXiv.org Artificial Intelligence

2408.13233

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Pennsylvania (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Government (0.31)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Fast Computation of Leave-One-Out Cross-Validation for $k$-NN Regression

Kanagawa, Motonobu

arXiv.org Machine Learning, May-8-2024

We describe a fast computation method for leave-one-out cross-validation (LOOCV) for $k$-nearest neighbours ($k$-NN) regression. We show that, under a tie-breaking condition for nearest neighbours, the LOOCV estimate of the mean square error for $k$-NN regression is identical to the mean square error of $(k+1)$-NN regression evaluated on the training data, multiplied by the scaling factor $(k+1)^2/k^2$. Therefore, to compute the LOOCV score, one needs to fit $(k+1)$-NN regression only once, and does not need to repeat the training-validation of $k$-NN regression once per training point. Numerical experiments confirm the validity of the fast computation method.
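The identity stated above can be checked numerically with a brute-force sketch. It assumes distinct training points with no distance ties (random continuous inputs), Euclidean distance, and unweighted neighbour averaging; the function names and the synthetic data are illustrative.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k):
    # Brute-force k-NN regression: average the targets of the k nearest training points.
    dists = np.linalg.norm(x_query[:, None, :] - x_train[None, :, :], axis=-1)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def loocv_mse_naive(x, y, k):
    # Hold out each point in turn and predict it from the remaining n - 1 points.
    n = len(x)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        pred = knn_predict(x[keep], y[keep], x[i:i + 1], k)[0]
        errors[i] = (y[i] - pred) ** 2
    return errors.mean()

def loocv_mse_fast(x, y, k):
    # One (k+1)-NN fit on the training data, rescaled by (k+1)^2 / k^2.
    fitted = knn_predict(x, y, x, k + 1)
    return ((k + 1) / k) ** 2 * np.mean((y - fitted) ** 2)

rng = np.random.default_rng(0)
x = rng.standard_normal((40, 2))
y = np.sin(x[:, 0]) + 0.1 * rng.standard_normal(40)

k = 3
naive = loocv_mse_naive(x, y, k)
fast = loocv_mse_fast(x, y, k)
```

The two values agree because each training point is its own nearest neighbour in the $(k+1)$-NN fit, so its residual there is the leave-one-out residual scaled by $k/(k+1)$.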

k-nn regression, nearest neighbour, regression, (12 more...)

arXiv.org Machine Learning

2405.04919

Country:

Europe > Spain > Galicia > Madrid (0.05)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.05)
Europe > France > Provence-Alpes-Côte d'Azur (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.62)

Fast Computation of Graph Kernels

Neural Information Processing Systems, Apr-6-2023, 15:12:05 GMT

Using extensions of linear algebra concepts to Reproducing Kernel Hilbert Spaces (RKHS), we define a unifying framework for random walk kernels on graphs. Reduction to a Sylvester equation allows us to compute many of these kernels in $O(n^3)$ worst-case time. This includes kernels whose previous worst-case time complexity was $O(n^6)$, such as the geometric kernels of Gärtner et al. [1] and the marginal graph kernels of Kashima et al. [2]. Our algebra in RKHS allows us to exploit sparsity in directed and undirected graphs more effectively than previous methods, yielding sub-cubic computational complexity when combined with conjugate gradient solvers or fixed-point iterations. Experiments on graphs from bioinformatics and other application domains show that our algorithms are often more than 1000 times faster than existing approaches.
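The gap between the two complexities can be sketched for one simple case. The example below assumes a geometric random-walk kernel of the form $q^\top (I - \lambda W_\times)^{-1} p$ with $W_\times = A_1 \otimes A_2$ and uniform $p$ and $q$; these choices, the decay $\lambda$ and the two small graphs are illustrative assumptions. The direct route solves a linear system on the product graph, while the fixed-point route uses the identity $(A_1 \otimes A_2)\,\mathrm{vec}(X) = \mathrm{vec}(A_1 X A_2^\top)$ (row-major vec) and never forms the Kronecker product.

```python
import numpy as np

def product_matvec(a1, a2, x):
    # Applies (a1 kron a2) to x.ravel() without building the Kronecker product.
    # x has shape (n1, n2); the result is returned in the same shape.
    return a1 @ x @ a2.T

def geometric_kernel_direct(a1, a2, lam):
    # Dense solve on the (n1 * n2)-node product graph.
    n = a1.shape[0] * a2.shape[0]
    w = np.kron(a1, a2)
    p = np.full(n, 1.0 / n)
    q = np.full(n, 1.0 / n)
    return q @ np.linalg.solve(np.eye(n) - lam * w, p)

def geometric_kernel_fixed_point(a1, a2, lam, iters=100):
    # Iterate X <- P + lam * A1 X A2^T; converges when lam * rho(A1) * rho(A2) < 1.
    n1, n2 = a1.shape[0], a2.shape[0]
    p = np.full((n1, n2), 1.0 / (n1 * n2))
    x = p.copy()
    for _ in range(iters):
        x = p + lam * product_matvec(a1, a2, x)
    return x.sum() / (n1 * n2)  # q^T vec(X) with uniform q

# Path graph on 3 nodes and cycle graph on 4 nodes.
a1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
a2 = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)

lam = 0.1
k_direct = geometric_kernel_direct(a1, a2, lam)
k_iter = geometric_kernel_fixed_point(a1, a2, lam)
```

For two graphs with $n$ nodes each, the dense solve works on an $n^2 \times n^2$ matrix, whereas each fixed-point step costs only two $n \times n$ matrix products.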

fast computation, graph kernel, kernel, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.83)

Fast Computation of Highly G-optimal Exact Designs via Particle Swarm Optimization

Walsh, Stephen J., Borkowski, John J.

arXiv.org Machine Learning, Jun-13-2022

Computing proposed exact $G$-optimal designs for response surface models is a difficult computation that has received incremental improvements via algorithm development in the last two decades. These optimal designs have not been considered widely in applications in part due to the difficulty and cost involved with computing them. Three primary algorithms for constructing exact $G$-optimal designs are presented in the literature: the coordinate exchange (CEXCH), a genetic algorithm (GA), and the relatively new $G$-optimal via $I_\lambda$-optimality algorithm ($G(I_\lambda)$-CEXCH) which was developed in part to address large computational cost. Particle swarm optimization (PSO) has achieved widespread use in many applications, but to date, its broad-scale success notwithstanding, has seen relatively few applications in optimal design problems. In this paper we develop an extension of PSO to adapt it to the optimal design problem. We then employ PSO to generate optimal designs for several scenarios covering $K = 1, 2, 3, 4, 5$ design factors, which are common experimental sizes in industrial experiments. We compare these results to all $G$-optimal designs published in the last two decades of literature. Published $G$-optimal designs generated by GA for $K=1, 2, 3$ factors have stood unchallenged for 14 years. We demonstrate that PSO has found improved $G$-optimal designs for these scenarios, and it does this with comparable computational cost to the state-of-the-art algorithm $G(I_\lambda)$-CEXCH. Further, we show that PSO is able to produce equal or better $G$-optimal designs for $K= 4, 5$ factors than those currently known. These results suggest that PSO is superior to existing approaches for efficiently generating highly $G$-optimal designs.
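For readers unfamiliar with PSO, a textbook global-best variant is sketched below. This is not the extension developed in the paper: the objective is a toy quadratic standing in for a design criterion, and the function name, swarm size, inertia and acceleration coefficients are illustrative assumptions.

```python
import numpy as np

def pso_minimize(f, dim, bounds, n_particles=30, iters=200,
                 inertia=0.72, c_personal=1.49, c_global=1.49, seed=0):
    # Textbook global-best PSO: each particle is pulled toward its own best
    # position and toward the best position found by the whole swarm.
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([f(p) for p in pos])
    best_idx = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[best_idx].copy(), pbest_val[best_idx]
    for _ in range(iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        vel = (inertia * vel
               + c_personal * r1 * (pbest - pos)
               + c_global * r2 * (gbest - pos))
        pos = np.clip(pos + vel, lo, hi)  # keep particles inside the search box
        val = np.array([f(p) for p in pos])
        improved = val < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = val[improved]
        best_idx = int(np.argmin(pbest_val))
        if pbest_val[best_idx] < gbest_val:
            gbest, gbest_val = pbest[best_idx].copy(), pbest_val[best_idx]
    return gbest, gbest_val

def sphere(x):
    # Toy objective with its minimum of 0 at the origin.
    return float(np.sum(x ** 2))

best_x, best_val = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

Adapting this to exact designs would require encoding a whole design matrix as a particle and replacing `sphere` with a design criterion, which is the part the paper addresses.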

artificial intelligence, evolutionary algorithm, machine learning, (3 more...)

arXiv.org Machine Learning

2206.06498

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

3 Key Machine Learning Models For Fraud Prevention

#artificialintelligence, Dec-25-2019, 14:32:51 GMT

Machine learning has a clear advantage over legacy modeling practices: among its various applications, fraud detection, fraud prevention, and anomaly prediction have been the most successful. As cybercriminals become more sophisticated in committing fraud, financial service providers in insurance, banking, money-transfer apps, and e-commerce platforms are spending billions to create a protective firewall. Machine learning tools can help reduce a company's costs and create a trustworthy environment. At the same time, the use of mobile e-commerce and digital payment apps has increased, and with it the incidence of fraud. According to 2018 reports, the rate of fraudulent activities committed all over the world is 47%.

fraud, fraud detection, fraudulent activity, (11 more...)

#artificialintelligence

Industry: Information Technology > Services > e-Commerce Services (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.33)

Fast Computation of Wasserstein Barycenters

Cuturi, Marco, Doucet, Arnaud

arXiv.org Machine Learning, Jun-17-2014

We present new algorithms to compute the mean of a set of empirical probability measures under the optimal transport metric. This mean, known as the Wasserstein barycenter, is the measure that minimizes the sum of its Wasserstein distances to each element in that set. We propose two original algorithms to compute Wasserstein barycenters that build upon the subgradient method. A direct implementation of these algorithms is, however, too costly because it would require the repeated resolution of large primal and dual optimal transport problems to compute subgradients. Extending the work of Cuturi (2013), we propose to smooth the Wasserstein distance used in the definition of Wasserstein barycenters with an entropic regularizer and recover in doing so a strictly convex objective whose gradients can be computed for a considerably cheaper computational cost using matrix scaling algorithms. We use these algorithms to visualize a large family of images and to solve a constrained clustering problem.
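The matrix scaling step mentioned above can be sketched for a single pair of histograms. This is a minimal Sinkhorn-style iteration for entropy-regularized optimal transport, not the barycenter algorithms themselves; the function name, the regularization strength `reg`, the iteration count and the example histograms are illustrative assumptions.

```python
import numpy as np

def sinkhorn_plan(a, b, cost, reg=0.5, iters=500):
    # Alternately rescale rows and columns of exp(-cost / reg) until the
    # resulting plan has marginals a (rows) and b (columns).
    kernel = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    return u[:, None] * kernel * v[None, :]

# Two histograms on a 5-point grid with squared-distance ground cost.
grid = np.linspace(0.0, 1.0, 5)
cost = (grid[:, None] - grid[None, :]) ** 2
a = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
b = np.full(5, 0.2)

plan = sinkhorn_plan(a, b, cost)
transport_cost = float(np.sum(plan * cost))  # transport cost of the regularized plan
```

Each iteration needs only matrix-vector products with a fixed kernel, which is why the smoothed objective is much cheaper to evaluate than solving a linear program for every pair.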

algorithm, barycenter, wasserstein barycenter, (12 more...)

arXiv.org Machine Learning

1310.4375

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)
